Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

received SIGSEGV when GC #550

Closed
ghost opened this issue Mar 26, 2016 · 15 comments
Closed

received SIGSEGV when GC #550

ghost opened this issue Mar 26, 2016 · 15 comments

Comments

@ghost
Copy link

ghost commented Mar 26, 2016

I update my old addon to support nan latest, update nodejs to v4.4.1. Run a while, got SIGSEGV when GC. I use valgrind and get the reason:

> ==23752== Invalid free() / delete / delete[] / realloc()
==23752==    at 0x4C2AD17: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==23752==    by 0xAB0235: v8::internal::Heap::FreeDeadArrayBuffers(bool) (in /home/ec2-user/node/bin/node)
==23752==    by 0xAD903E: v8::internal::MarkCompactCollector::SweepSpaces() (in /home/ec2-user/node/bin/node)
==23752==    by 0xAE2DA7: v8::internal::MarkCompactCollector::CollectGarbage() (in /home/ec2-user/node/bin/node)
==23752==    by 0xA9965F: v8::internal::Heap::MarkCompact() (in /home/ec2-user/node/bin/node)
==23752==    by 0xAB0F77: v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) (in /home/ec2-user/node/bin/node)
==23752==    by 0xAB1518: v8::internal::Heap::CollectGarbage(v8::internal::GarbageCollector, char const*, char const*, v8::GCCallbackFlags) (in /home/ec2-user/node/bin/node)
==23752==    by 0xA5C3CF: v8::internal::Factory::NewJSTypedArray(v8::internal::ExternalArrayType) (in /home/ec2-user/node/bin/node)
==23752==    by 0xA5C8C5: v8::internal::Factory::NewJSTypedArray(v8::internal::ExternalArrayType, v8::internal::Handle<v8::internal::JSArrayBuffer>, unsigned long, unsigned long) (in /home/ec2-user/node/bin/node)
==23752==    by 0x8D3E12: v8::Uint8Array::New(v8::Local<v8::ArrayBuffer>, unsigned long, unsigned long) (in /home/ec2-user/node/bin/node)
==23752==    by 0xDE3AA8: node::Buffer::New(v8::Isolate*, char*, unsigned long) (in /home/ec2-user/node/bin/node)
==23752==    by 0x984EC7B: ReturnMaker<(DataType)2>::make(std::tuple<unsigned long, char const*, unsigned long> const&) (in /home/ec2-user/dmp-common-datacollector/node_modules/lmdb-queue/build/Release/lmdb-queue.node)
==23752==  Address 0x5c0ad30a is not stack'd, malloc'd or (recently) free'd
==23752==
==23752== Invalid free() / delete / delete[] / realloc()
==23752==    at 0x4C2AD17: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==23752==    by 0xAB0815: v8::internal::Heap::Scavenge() (in /home/ec2-user/node/bin/node)
==23752==    by 0xAB0D23: v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) (in /home/ec2-user/node/bin/node)
==23752==    by 0xAB143C: v8::internal::Heap::CollectGarbage(v8::internal::GarbageCollector, char const*, char const*, v8::GCCallbackFlags) (in /home/ec2-user/node/bin/node)
==23752==    by 0xA5217B: v8::internal::Factory::NewFillerObject(int, bool, v8::internal::AllocationSpace) (in /home/ec2-user/node/bin/node)
==23752==    by 0xC839DD: v8::internal::Runtime_AllocateInTargetSpace(int, v8::internal::Object**, v8::internal::Isolate*) (in /home/ec2-user/node/bin/node)
==23752==    by 0x15A827A06354: ???
==23752==    by 0x15A827C97B00: ???
==23752==    by 0x15A827A09FF6: ???
==23752==    by 0x15A827C72FC1: ???
==23752==    by 0x15A827A09FF6: ???
==23752==    by 0x15A827BFCFD4: ???
==23752==  Address 0x5ed7b908 is not stack'd, malloc'd or (recently) free'd
==23752==
==23752== Invalid free() / delete / delete[] / realloc()
==23752==    at 0x4C2AD17: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==23752==    by 0xAB01C4: v8::internal::Heap::FreeDeadArrayBuffers(bool) (in /home/ec2-user/node/bin/node)
==23752==    by 0xAD903E: v8::internal::MarkCompactCollector::SweepSpaces() (in /home/ec2-user/node/bin/node)
==23752==    by 0xAE2DA7: v8::internal::MarkCompactCollector::CollectGarbage() (in /home/ec2-user/node/bin/node)
==23752==    by 0xA9965F: v8::internal::Heap::MarkCompact() (in /home/ec2-user/node/bin/node)
==23752==    by 0xAB0F77: v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) (in /home/ec2-user/node/bin/node)
==23752==    by 0xAB1518: v8::internal::Heap::CollectGarbage(v8::internal::GarbageCollector, char const*, char const*, v8::GCCallbackFlags) (in /home/ec2-user/node/bin/node)
==23752==    by 0xAB1F20: v8::internal::Heap::HandleGCRequest() (in /home/ec2-user/node/bin/node)
==23752==    by 0xA4DD5B: v8::internal::StackGuard::HandleInterrupts() (in /home/ec2-user/node/bin/node)
==23752==    by 0xC8389A: v8::internal::Runtime_StackGuard(int, v8::internal::Object**, v8::internal::Isolate*) (in /home/ec2-user/node/bin/node)
==23752==    by 0x15A827A0963A: ???
==23752==    by 0x15A827C7C5AE: ???
==23752==  Address 0x5f8783b0 is not stack'd, malloc'd or (recently) free'd
==23752==

In my Code,

template<> Local<Value> ReturnMaker<BUFFER_TYPE>::make(const Consumer::ItemType& item) {
  return NewBuffer((char*)std::get<1>(item), std::get<2>(item)).ToLocalChecked();
}

Before update, I use nan 1.8.3,and the old code is:

template<> Local<Value> ReturnMaker<BUFFER_TYPE>::make(const Consumer::ItemType& item) {
    return NanNewBufferHandle(std::get<1>(item), std::get<2>(item));
}

I diff nan 2.2.0 and nan 1.8.3's NewBuffer:

2.2.0:

  NAN_INLINE MaybeLocal<v8::Object> NewBuffer(
      char *data
    , size_t length
#if NODE_MODULE_VERSION > IOJS_2_0_MODULE_VERSION
    , node::Buffer::FreeCallback callback
#else
    , node::smalloc::FreeCallback callback
#endif
    , void *hint
  ) {
    // arbitrary buffer lengths requires
    // NODE_MODULE_VERSION >= IOJS_3_0_MODULE_VERSION
    assert(length <= imp::kMaxLength && "too large buffer");
#if NODE_MODULE_VERSION > IOJS_2_0_MODULE_VERSION
    return node::Buffer::New(
        v8::Isolate::GetCurrent(), data, length, callback, hint);
#else
    return MaybeLocal<v8::Object>(node::Buffer::New(
        v8::Isolate::GetCurrent(), data, length, callback, hint));
#endif
  }

1.8.3:

  NAN_INLINE v8::Local<v8::Object> NanNewBufferHandle (uint32_t size) {
    return node::Buffer::New(v8::Isolate::GetCurrent(), size);
  }

I cann't find anything wrong.

My addon http://github.com/zwb-ict/lmdb-queue

In master branch, ForDebug dir:

Can run

node producer.js 

for ten min.

Then stop producer.js, just run

node consumer.js

When GC, SIGSEGV will be received .

Mac OS seems OK.
This issue is on Centos 7

Linux ip-172-31-10-243.eu-central-1.compute.internal 3.10.0-229.el7.x86_64 #1 SMP Thu Jan 29 18:37:38 EST 2015 x86_64 x86_64 x86_64 GNU/Linux

@bnoordhuis
Copy link
Member

Can you create a reduced stand-alone test case? Preferably something that's 50 lines or less.

@mathiask88
Copy link
Contributor

I think the bug is that your item's char in producer.create is allocated with new But if NanNewBuffer does not have a FreeCallback to define how you want to deallocate the memory nan uses free by default. You can read about that in the docs.

@kkoopa
Copy link
Collaborator

kkoopa commented Mar 26, 2016

Yes, either that, or a static address is being passed to the function and then free gets called on that. A further difference is that NewBuffer takes ownership of the pointer it was passed.

On March 26, 2016 12:12:55 PM GMT+02:00, "Mathias Küsel" notifications@github.com wrote:

I think the bug is that your item's char in producer.create is
allocated with new But if NanNewBuffer does not have a FreeCallback
to define how you want to deallocate the memory nan uses free by
default. You can read about that in the docs.


You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
#550 (comment)

@ghost
Copy link
Author

ghost commented Mar 26, 2016

Thank you @bnoordhuis, you may do like this:

My addon http://github.com/zwb-ict/lmdb-queue

In master branch, ForDebug dir:

Can run

node producer.js
for ten min.

Then stop producer.js, just run

node consumer.js
When GC, SIGSEGV will be received .

Mac OS seems OK.
This issue is on Centos 7

Linux ip-172-31-10-243.eu-central-1.compute.internal 3.10.0-229.el7.x86_64 #1 SMP Thu Jan 29 18:37:38 EST 2015 x86_64 x86_64 x86_64 GNU/Linux

I can reappear this issue in this way all the time.

@ghost
Copy link
Author

ghost commented Mar 26, 2016

Thank you @mathiask88 @kkoopa The producer and consumer are different process, The

template<> Local<Value> ReturnMaker<BUFFER_TYPE>::make(const Consumer::ItemType& item) {
  return NewBuffer((char*)std::get<1>(item), std::get<2>(item)).ToLocalChecked();
}

only used in Consumer.

And Consumer's ItemType is not the same with Produer's

class Consumer {
  public:
    typedef std::tuple<uint64_t, const char*, size_t> ItemType;
class Producer {
public:
    class ItemType {
    public:
        static ItemType create(size_t len);

    public:
        ItemType(char* mem, size_t len) : _mem(mem), _len(len), _shouldDelete(false) {
        }

Only used in Consumer, not Producer.

void Consumer::pop(BatchType& result, size_t cnt) {
    result.reserve(cnt);
    bool shouldRotate = false;

    {
        Txn txn(_topic->getEnv(), NULL);

        uint64_t phead = _topic->getProducerHead(txn);
        uint64_t head = _topic->getConsumerHead(txn, _name);

        if ((head - phead == 1) || phead == 0)
            return;

        int rc = _cursor->gte(head);

        if (rc == 0) {
            uint64_t offset = 0;

            for (; rc == 0 && cnt > 0; --cnt) {
                offset = _cursor->key<uint64_t>();
                const char* data = (const char*)_cursor->val().mv_data;
                size_t len = _cursor->val().mv_size;
                result.push_back(ItemType(offset, data, len));
                rc = _cursor->next();
            }

            if (offset > 0) {
                _topic->setConsumerHead(txn, _name, offset + 1);
                txn.commit();
            }
        } else {
            if (rc != MDB_NOTFOUND) cout << "Consumer seek error: " << mdb_strerror(rc) << endl;

            if (head <= _topic->getProducerHead(txn)) {
                shouldRotate = true;
            }
        }
    }

    if (shouldRotate) {
        rotate();
        pop(result, cnt);
    }
}

I create ItemType in this pop function, is there anything wrong in pop

     const char* data = (const char*)_cursor->val().mv_data;
     size_t len = _cursor->val().mv_size;
     result.push_back(ItemType(offset, data, len));

@kkoopa
Copy link
Collaborator

kkoopa commented Mar 26, 2016

As you seem to have noticed, you can use CopyBuffer to get the old behavior, but it is also slower due to doing a memory copy. If you cannot reasonably restructure your code to allow node to take ownership of the buffer and its life cycle, what you could do instead is pass a custom FreeCallback and appropriate hint to NewBuffer and take different actions in the FreeCallback dependent of the hint, for example set or check a flag referenced by the hint, which would let other objects know that the memory block has been freed.

@ghost
Copy link
Author

ghost commented Mar 26, 2016

yes, @kkoopa but It seems that use CopyBuffer or NewBuffer with an empty FreeCallback will cause memory leak!

with glace RES increase from 100M to 1G.

DISK I/O     R/s    W/s     58.1  18.0 5.65G 1.24G 31884 ec2-user     0 R   2:52.21  977K     0 ng-hw2dc.js
xvda1          0      0     56.6  17.2 5.61G 1.18G 31879 ec2-user     0 R   2:52.70    2M     0 ng-hw2dc.js
xvda2      4.97M    29K     55.6  16.2 5.57G 1.12G 31905 ec2-user     0 R   2:50.53  973K     0 ng-hw2dc.js
                            55.0   4.5 3.43G  303M   338 ec2-user     0 R   0:10.85    1M     0 ng-hw2dc.js

I'm not sure how to handle char *data , in my code, I get this from lmdb's B+ tree. I didn't think I should delete/free it . But now it seems cause memory leak! I really don't know how to fix this!

The pop function will call gte, which will call lmdb's mdb_cursor_get

    int gte(const MDB_val& k) {
        _key = k;
        return mdb_cursor_get(_cursor, &_key, &_val, MDB_SET_RANGE);
    }

The val's mv_data will pass to NewBuffer/CopyBuffer, and the val's mv_data's value comes from lmdb's
B+ tree. So I didn't new/malloc memory for mv_data, and I don't how how to handle it.

int
mdb_cursor_get(MDB_cursor *mc, MDB_val *key, MDB_val *data,
    MDB_cursor_op op)
{
    int      rc;
    int      exact = 0;
    int      (*mfunc)(MDB_cursor *mc, MDB_val *key, MDB_val *data);

    if (mc == NULL)
        return EINVAL;

    if (mc->mc_txn->mt_flags & MDB_TXN_BLOCKED)
        return MDB_BAD_TXN;

    switch (op) {
    case MDB_GET_CURRENT:
        if (!(mc->mc_flags & C_INITIALIZED)) {
            rc = EINVAL;
        } else {
            MDB_page *mp = mc->mc_pg[mc->mc_top];
            int nkeys = NUMKEYS(mp);
            if (!nkeys || mc->mc_ki[mc->mc_top] >= nkeys) {
                mc->mc_ki[mc->mc_top] = nkeys;
                rc = MDB_NOTFOUND;
                break;
            }
            rc = MDB_SUCCESS;
            if (IS_LEAF2(mp)) {
                key->mv_size = mc->mc_db->md_pad;
                key->mv_data = LEAF2KEY(mp, mc->mc_ki[mc->mc_top], key->mv_size);
            } else {
                MDB_node *leaf = NODEPTR(mp, mc->mc_ki[mc->mc_top]);
                MDB_GET_KEY(leaf, key);
                if (data) {
                    if (F_ISSET(leaf->mn_flags, F_DUPDATA)) {
                        rc = mdb_cursor_get(&mc->mc_xcursor->mx_cursor, data, NULL, MDB_GET_CURRENT);
                    } else {
                        rc = mdb_node_read(mc, leaf, data);
                    }
                }
            }
        }
        break;
    case MDB_GET_BOTH:
    case MDB_GET_BOTH_RANGE:
        if (data == NULL) {
            rc = EINVAL;
            break;
        }
        if (mc->mc_xcursor == NULL) {
            rc = MDB_INCOMPATIBLE;
            break;
        }
        /* FALLTHRU */
    case MDB_SET:
    case MDB_SET_KEY:
    case MDB_SET_RANGE:
        if (key == NULL) {
            rc = EINVAL;
        } else {
            rc = mdb_cursor_set(mc, key, data, op,
                op == MDB_SET_RANGE ? NULL : &exact);
        }
        break;
    case MDB_GET_MULTIPLE:
        if (data == NULL || !(mc->mc_flags & C_INITIALIZED)) {
            rc = EINVAL;
            break;
        }
        if (!(mc->mc_db->md_flags & MDB_DUPFIXED)) {
            rc = MDB_INCOMPATIBLE;
            break;
        }
        rc = MDB_SUCCESS;
        if (!(mc->mc_xcursor->mx_cursor.mc_flags & C_INITIALIZED) ||
            (mc->mc_xcursor->mx_cursor.mc_flags & C_EOF))
            break;
        goto fetchm;
    case MDB_NEXT_MULTIPLE:
        if (data == NULL) {
            rc = EINVAL;
            break;
        }
        if (!(mc->mc_db->md_flags & MDB_DUPFIXED)) {
            rc = MDB_INCOMPATIBLE;
            break;
        }
        if (!(mc->mc_flags & C_INITIALIZED))
            rc = mdb_cursor_first(mc, key, data);
        else
            rc = mdb_cursor_next(mc, key, data, MDB_NEXT_DUP);
        if (rc == MDB_SUCCESS) {
            if (mc->mc_xcursor->mx_cursor.mc_flags & C_INITIALIZED) {
                MDB_cursor *mx;
fetchm:
                mx = &mc->mc_xcursor->mx_cursor;
                data->mv_size = NUMKEYS(mx->mc_pg[mx->mc_top]) *
                    mx->mc_db->md_pad;
                data->mv_data = METADATA(mx->mc_pg[mx->mc_top]);
                mx->mc_ki[mx->mc_top] = NUMKEYS(mx->mc_pg[mx->mc_top])-1;
            } else {
                rc = MDB_NOTFOUND;
            }
        }
        break;
    case MDB_NEXT:
    case MDB_NEXT_DUP:
    case MDB_NEXT_NODUP:
        if (!(mc->mc_flags & C_INITIALIZED))
            rc = mdb_cursor_first(mc, key, data);
        else
            rc = mdb_cursor_next(mc, key, data, op);
        break;
    case MDB_PREV:
    case MDB_PREV_DUP:
    case MDB_PREV_NODUP:
        if (!(mc->mc_flags & C_INITIALIZED)) {
            rc = mdb_cursor_last(mc, key, data);
            if (rc)
                break;
            mc->mc_flags |= C_INITIALIZED;
            mc->mc_ki[mc->mc_top]++;
        }
        rc = mdb_cursor_prev(mc, key, data, op);
        break;
    case MDB_FIRST:
        rc = mdb_cursor_first(mc, key, data);
        break;
    case MDB_FIRST_DUP:
        mfunc = mdb_cursor_first;
    mmove:
        if (data == NULL || !(mc->mc_flags & C_INITIALIZED)) {
            rc = EINVAL;
            break;
        }
        if (mc->mc_xcursor == NULL) {
            rc = MDB_INCOMPATIBLE;
            break;
        }
        {
            MDB_node *leaf = NODEPTR(mc->mc_pg[mc->mc_top], mc->mc_ki[mc->mc_top]);
            if (!F_ISSET(leaf->mn_flags, F_DUPDATA)) {
                MDB_GET_KEY(leaf, key);
                rc = mdb_node_read(mc, leaf, data);
                break;
            }
        }
        if (!(mc->mc_xcursor->mx_cursor.mc_flags & C_INITIALIZED)) {
            rc = EINVAL;
            break;
        }
        rc = mfunc(&mc->mc_xcursor->mx_cursor, data, NULL);
        break;
    case MDB_LAST:
        rc = mdb_cursor_last(mc, key, data);
        break;
    case MDB_LAST_DUP:
        mfunc = mdb_cursor_last;
        goto mmove;
    default:
        DPRINTF(("unhandled/unimplemented cursor operation %u", op));
        rc = EINVAL;
        break;
    }

    if (mc->mc_flags & C_DEL)
        mc->mc_flags ^= C_DEL;

    return rc;
}

@Venemo
Copy link

Venemo commented Mar 26, 2016

@zwb-ict I'm also writing a node module based on LMDB. The data returned by LMDB is owned by LMDB and you should not try to free it yourself, because that would result in a SIGSEGV.

I had a similar issue and also thought that the empty FreeCallback causes a memory leak but it is most probably something else in your code.

@ghost
Copy link
Author

ghost commented Mar 26, 2016

Thank you @bnoordhuis @mathiask88 @kkoopa This issue has solved.

I add an empty FreeCallBack as NewBuffer's param.

The pointer which passed to NewBuffer is from lmdb's B+ tree. So It shouldn't be free/delete.

@kkoopa The memory leak is caused by http's keep-alive. (setup too many).

@ghost
Copy link
Author

ghost commented Mar 26, 2016

Thank you @Venemo, you are right. I found the memory leak's real reason is my program opens too many keepalive sockets.

@ghost ghost closed this as completed Mar 26, 2016
@Venemo
Copy link

Venemo commented Mar 26, 2016

@zwb-ict By the way, if you have questions about how to deal with LMDB, you can join us in the #openldap channel on Freenode or use the openldap-devel mailing list to ask. The authors of LMDB have been very helpful to me too.

@ghost
Copy link
Author

ghost commented Mar 26, 2016

@Venemo That's great! Happy to join you!

@ghost
Copy link
Author

ghost commented Mar 27, 2016

@Venemo @bnoordhuis @mathiask88 @kkoopa Would you please review my code if you have time? I'm a junior engineer, and want to write code better. Please Let me know my bad design and implement.
Thank you very much!
http://github.com/zwb-ict/lmdb-queue

I want to impl a simple queue, which is similar with kafka, but lightweight.

The meta db used to record procuder and consumers

Topic1 (db)

Key Value
Conumser_%s nextOffset
Consumer_%s nextOffset
Consumer_%s nextOffset
Producer lastOffset
dataDBFileIndex maxoffset
dataDBFileIndex + 1 maxoffset
dataDBFileIndex + 2 maxoffset
dataDBFileIndex + 3 maxoffset
... ... ... ...
dataDBFileIndex + n maxoffset

The data db used to record real data.

data file1 (db)

Key Value
offset data(bin)
offset + 1 data(bin)
offset + 2 data(bin)
offset + 3 data(bin)
offset + 4 data(bin)
... ... ... ...
offset + n data(bin)

The Consumer_%s is string,dataDBFileIndex is uint32, and offset is uint64

So my compare func looks like:

int descCmp(const MDB_val *a, const MDB_val *b) {
    /* DESC order in size */
    if (a->mv_size > b->mv_size) return -1;
    if (a->mv_size < b->mv_size) return 1;

    switch (a->mv_size)
    {
        case sizeof(uint32_t) :
            return mdbIntCmp<uint32_t>(a, b);
        case sizeof(uint64_t) :
            return mdbIntCmp<uint64_t>(a, b);
        default:
            return memcmp(a->mv_data, b->mv_data, a->mv_size);
    }
}

template<typename INT_TYPE> int mdbIntCmp(const MDB_val *a, const MDB_val *b) {
    INT_TYPE ia = *(INT_TYPE*)a->mv_data;
    INT_TYPE ib = *(INT_TYPE*)b->mv_data;
    return ia < ib ? -1 : ia > ib ? 1 : 0;
}

My email is zwb.ict@gmail.com
Twitter: zwb-ict

@Venemo
Copy link

Venemo commented Mar 27, 2016

@zwb-ict First I have two suggestions for you:

  • Try to write comments in your code explaining what it does and how it works. Currently you have 1271 lines of code, and only 7 comments, none of which explains anything.
  • Instead of posting random fragments of explanation and code in the comments here, please improve your own documentation in your readme file. For a successful project it is important to document both the prupose of your project (what goal are you trying to achieve and why), and the implementation concept (how you are achieving your goal and why).

Nobody likes to read undocumented code.

Finally, this isn't the right place to ask for code reviews. Some people would even consider it rude. But since you look like a beginner, let me point you to a place for this.

@ghost
Copy link
Author

ghost commented Mar 27, 2016

@Venemo Thank you very much and I'm so sorry to bother you.
From now on, I'll improve as you told me. It's really very helpful to me!
Thank you!

This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants